8 research outputs found
Effective tuning of regression models using an evolutionary approach: a case study
Hyperparameters enable machine learning algorithms to be customized for specific datasets. Choosing the right hyperparameters is a challenge often faced by machine learning practitioners. In this research, the tuning of hyperparameters for regression models was explored. Models predicting house prices in King County were created using a detailed suite of regression algorithms. Both traditional approaches and evolutionary algorithms for improving model accuracy were evaluated. As part of the traditional approaches, a variety of feature selection methods and hyperparameter tuning using grid search, random search and pipeline optimization were studied. Furthermore, evolutionary algorithms were applied to model optimization. In this paper, it is shown that an evolutionary approach, implemented with TPOT, achieves the highest accuracy for a regression model based on the King County dataset. Regarding metrics, combining RMSE with complementary metrics is shown to be an effective means of determining model accuracy. Finally, greedy feature selection performed best when a variety of feature selection methods were compared.
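The grid search and random search mentioned above can be sketched with a generic scikit-learn example. This is illustrative only, not the paper's code: the dataset and parameter ranges are stand-ins, not the King County setup.

```python
# Minimal sketch of traditional hyperparameter tuning: grid search
# exhaustively evaluates every parameter combination, while random
# search samples a fixed number of them.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Grid search: tries all 6 combinations, scored by (negated) RMSE.
grid = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                    cv=3, scoring="neg_root_mean_squared_error")
grid.fit(X, y)

# Random search: samples only 4 of the combinations.
rand = RandomizedSearchCV(RandomForestRegressor(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0,
                          scoring="neg_root_mean_squared_error")
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```

An evolutionary tool such as TPOT replaces the fixed search space with a population of candidate pipelines evolved over generations.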
Open-source neural architecture search with ensemble and pre-trained networks
The training and optimization of neural networks, using pre-trained, super learner and ensemble approaches, is explored. Neural networks, and in particular Convolutional Neural Networks (CNNs), are often optimized using default parameters. Neural Architecture Search (NAS) enables multiple architectures to be evaluated prior to selection of the optimal architecture. Our contribution is to develop, and make available to the community, a system that integrates open-source tools for the neural architecture search (OpenNAS) of image classification models. OpenNAS takes any dataset of grayscale or RGB images and generates the optimal CNN architecture. Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and pre-trained models serve as base learners for ensembles. Meta-learner algorithms are subsequently applied to these base learners and the ensemble performance on image classification problems is evaluated. Our results show that a stacked generalization ensemble of heterogeneous models is the most effective approach to image classification within OpenNAS.
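The stacked generalization approach described above can be illustrated with a minimal scikit-learn sketch. This is not the OpenNAS code: the base learners here are simple stand-ins for the PSO-, ACO- and pre-trained-model networks.

```python
# Minimal sketch of stacked generalization: heterogeneous base
# learners feed their predictions to a meta-learner, which learns
# how to combine them.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```

The meta-learner is trained on out-of-fold predictions of the base learners, which is what distinguishes stacking from simple averaging.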
Neural architecture search using particle swarm and ant colony optimization
Neural network models have a number of hyperparameters that must be chosen along with their architecture. This can be a heavy burden on a novice user, who must choose which architecture to use and what values to assign to its parameters. In most cases, default hyperparameters and architectures are used. Significant improvements to model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures. A system integrating open-source tools for Neural Architecture Search (OpenNAS) in the classification of images has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics, using either an AutoKeras, a transfer learning or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In the context of this paper, we focus on training and optimizing CNNs using the SI components of OpenNAS. Two major types of SI algorithms, namely PSO and ACO, are compared to see which is more effective in generating higher model accuracies. It is shown, with our experimental design, that the PSO algorithm performs better than ACO. The performance improvement of PSO is most notable with a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.
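The PSO algorithm compared above can be sketched in a few lines. This toy version (not the OpenNAS implementation) minimizes a simple quadratic as a stand-in for an architecture-evaluation objective; all parameter values are illustrative.

```python
# Toy PSO: each particle's velocity is pulled toward its personal
# best position and the swarm's global best, with inertia weight w
# and acceleration coefficients c1 (cognitive) and c2 (social).
import random

def pso(objective, dim=2, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5):
    random.seed(0)
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    gbest = pbest[pbest_val.index(min(pbest_val))][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < objective(gbest):
                    gbest = pos[i][:]
    return gbest

best = pso(lambda p: sum(x * x for x in p))  # minimize the sphere function
print(best)
```

In a NAS setting, the position vector would encode architecture choices (layer counts, filter sizes) and the objective would be validation accuracy of the trained candidate network.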
Enhanced neural architecture search using super learner and ensemble approaches
Neural networks, and in particular Convolutional Neural Networks (CNNs), are often optimized using default parameters. Neural Architecture Search (NAS) enables multiple architectures to be evaluated prior to selection of the optimal architecture. A system integrating open-source tools for Neural Architecture Search (OpenNAS) of image classification problems has been developed and made available to the open-source community. OpenNAS takes any dataset of grayscale or RGB images and generates the optimal CNN architecture. The training and optimization of neural networks, using super learner and ensemble approaches, is explored in this research. Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and pre-trained models serve as base learners for network ensembles. Meta-learner algorithms are subsequently applied to these base learners and the ensemble performance on image classification problems is evaluated. Our results show that a stacked generalization ensemble of heterogeneous models is the most effective approach to image classification within OpenNAS.
Transformers for low-resource languages: is féidir linn!
The Transformer model is the state of the art in Machine Translation. However, in general, neural translation models often underperform on language pairs with insufficient training data. As a consequence, relatively few experiments have been carried out using this architecture on low-resource language pairs. In this study, hyperparameter optimization of Transformer models in translating the low-resource English-Irish language pair is evaluated. We demonstrate that choosing appropriate parameters leads to considerable performance improvements. Most importantly, the correct choice of subword model is shown to be the biggest driver of translation performance. SentencePiece models using both unigram and BPE approaches were appraised. Variations on model architectures included modifying the number of layers, testing various regularization techniques and evaluating the optimal number of heads for attention. A generic 55k DGT corpus and an in-domain 88k public admin corpus were used for evaluation. A Transformer-optimized model demonstrated a BLEU score improvement of 7.8 points when compared with a baseline RNN model. Improvements were observed across a range of metrics, including TER, indicating a substantially reduced post-editing effort for Transformer-optimized models with 16k BPE subword models. Benchmarked against Google Translate, our translation engines demonstrated significant improvements. The question of whether or not Transformers can be used effectively in a low-resource setting of English-Irish translation has been addressed. Is féidir linn - yes we can.
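The BPE subword models identified above as the biggest driver of translation performance can be illustrated with a toy merge-learning sketch. This is the core BPE idea, not SentencePiece itself; the corpus and merge count are illustrative.

```python
# Toy BPE: repeatedly merge the most frequent adjacent symbol pair,
# so frequent character sequences become single subword units.
from collections import Counter

def learn_bpe(words, num_merges):
    # Represent each word as a tuple of symbols (characters to start).
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], 3))
```

In practice the vocabulary size (e.g. the 16k model found optimal here) controls how many merges are learned, trading off between short, frequent subwords and whole-word units.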
Machine translation in the Covid domain: an English-Irish case study for LoResMT 2021
Translation models for the specific domain of translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques, using a Covid-adapted generic 55k corpus from the Directorate General of Translation, were applied. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data, from the Health and Education domains, was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
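The data side of the mixed fine-tuning approach compared above can be sketched as follows. This is an illustrative stand-in, not the study's code: the sentence pairs and the oversampling ratio are hypothetical.

```python
# Mixed fine-tuning (data preparation): continue training on a mix of
# the generic corpus and the in-domain corpus, with the small
# in-domain data oversampled so it is not drowned out.
import random

def mixed_finetune_corpus(generic, in_domain, oversample=3, seed=0):
    """Return a shuffled training corpus with in_domain repeated
    `oversample` times."""
    mixed = list(generic) + list(in_domain) * oversample
    random.Random(seed).shuffle(mixed)
    return mixed

generic = [("hello", "dia dhuit")] * 5          # stand-in generic pairs
covid = [("wash your hands", "nigh do lámha")]  # stand-in Covid pair
corpus = mixed_finetune_corpus(generic, covid)
print(len(corpus))  # 5 + 1*3 = 8
```

Plain fine-tuning would instead continue training on the in-domain data alone, while the combined dataset approach trains from scratch on the concatenation.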
gaHealth: An English-Irish bilingual corpus of health data
Machine Translation is a mature technology for many high-resource language pairs. However, in the context of low-resource languages, there is a paucity of parallel datasets available for developing translation models. Furthermore, the development of datasets for low-resource languages often focuses on simply creating the largest possible dataset for generic translation. The benefits of developing smaller in-domain datasets can easily be overlooked. To assess the merits of using in-domain data, a dataset for the specific domain of health was developed for the low-resource English-Irish language pair. Our study outlines the process used in developing the corpus and empirically demonstrates the benefits of using an in-domain dataset for the health domain. In the context of translating health-related data, models developed using the gaHealth corpus demonstrated a maximum BLEU score improvement of 22.2 points (40%) when compared with top-performing models from the LoResMT2021 Shared Task. Furthermore, we define linguistic guidelines for developing gaHealth, the first bilingual corpus of health data for the Irish language, which we hope will be of use to other creators of low-resource datasets. gaHealth is now freely available online and is ready to be explored for further research.
Human evaluation of English-Irish transformer-based NMT
In this study, a human evaluation is carried out on how hyperparameter settings impact the quality of Transformer-based Neural Machine Translation (NMT) for the low-resourced English-Irish pair. SentencePiece models using both Byte Pair Encoding (BPE) and unigram approaches were appraised. Variations in model architectures included modifying the number of layers, evaluating the optimal number of heads for attention and testing various regularisation techniques. The greatest performance improvement was recorded for a Transformer-optimized model with a 16k BPE subword model. Compared with a baseline Recurrent Neural Network (RNN) model, a Transformer-optimized model demonstrated a BLEU score improvement of 7.8 points. When benchmarked against Google Translate, our translation engines demonstrated significant improvements. Furthermore, a quantitative fine-grained manual evaluation was conducted which compared the performance of machine translation systems. Using the Multidimensional Quality Metrics (MQM) error taxonomy, a human evaluation of the error types generated by an RNN-based system and a Transformer-based system was explored. Our findings show the best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.